SUMMARY: This research project compares the age-specific Healthcare Access and Quality (HAQ) indices among US States for 2010 and 2016. The HAQ index is based on amenable mortality (a measure of deaths that could have been prevented with timely and effective healthcare).
RESEARCH QUESTION: Does the Healthcare Access and Quality index vary across ages among US states (are they comparable or incompatible)? Additionally, is there any observed association between private or public insurance coverage and higher indices?
GOAL: Compare US States with varying insurance coverage (a combination of private and public coverage).
#Load libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(olsrr)
##
## Attaching package: 'olsrr'
##
## The following object is masked from 'package:MASS':
##
## cement
##
## The following object is masked from 'package:datasets':
##
## rivers
library(leaps)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(stats)
library(sandwich)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:MASS':
##
## select
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
#Import dataset
#---Results are available for 50 states + District of Columbia.
#---Each state has 2 observations (corresponding to the years 2010 and 2016) for each age group.
#---There are 16 age groups, which encompass ages from 0-11 months (group 1) to 70-74 years (group 16). Each group, except for group 1, contains 4 individual age numbers (1-4,5-9,etc.)
setwd("/Users/clarapineda/Desktop/CAPSTONE - STATS/CAPSTONE - STATS")
healthcare.data <- read.csv("IHME_HAQ_INDEX_1990_2016_US_STATES_Y2021M11D17.CSV", header = TRUE, sep = ",")
#Now, the imported dataset will be arranged by year in order to account for differences over this 6-year period.
clean_healthcare_data <- healthcare.data %>% arrange(year_id)
#Now, we will create categories for all age groups and using this categories to create a new variable: AGE1.
#--- Final categories: Children, Adolescents, Young Adults, Adults, Seniors.
categories <- c("Children","Adolescents", "Young Adults","Adults","Seniors")
clean_healthcare_data <- clean_healthcare_data %>% mutate(age1 = case_when( age_group_id >= 4 & age_group_id<= 6 ~categories[1], age_group_id >= 7 & age_group_id <= 9 ~categories[2], age_group_id >= 10 & age_group_id <= 12 ~ categories[3], age_group_id >= 13 & age_group_id <= 16 ~categories[4], age_group_id >= 17 & age_group_id <=19 ~ categories[5]))
The next sections will explore different multivariate regression models for 1 State, 5 states, and 51 states, respectively, considering the following variable assignment:
In addition, we have found that the final model that best describes the data is the following:
\(Y_i= β_0+β_1X_{1i}+β_2X_{2i}+β_3X_{3i}+Σ_{a=2}^{5}[β_{0a}d_{ai}+𝛾_{1a}d_{ai}X_{1i}+𝛾_{2a}d_{ai}X_{2i}+𝛾_{3a}d_{ai}X_{3i}]+ε_i\)
where \(i=1,...,n\), \(a=2,...,5\), \(x_i,x_2,x_3\) represent income, public, and private, and \(ε_i\) follows N ~ \((0,𝜎^2)\)
-> The category “Adolescents” has been chosen as the reference age category, so we create dummy variables for the remaining categories: \(d_{1i}=1\) if age1\(_i\) = Children, 0 otherwise. Similarly, \(d_{3i},d_{4i},d_{5i}\) for Young Adults, Adults, and Seniors, respectively.
-> The intercept \(β_{0a}\) is is the expected difference in HAQ index between each age group \(a\) and Adolescents when all continuous predictors are at their reference value (i.e., median-income is at its center, and the private and public coverage rates are at their centers).
-> The extra‑slope \(𝛾_{ia}\) simply explains “how much steeper or flatter this age group’s line is compared with Adolescents”.
Interpretation of the model: Start with a baseline line (intercept) and four baseline slopes \(β_{1-4}\). Then, let every age group have its own intercept shift and its own extra slope for each continuous predictor (income, private, public).
1 state
#Initial model - Indiana
indiana_2010 <- clean_healthcare_data %>% filter(location_name == "Indiana", year_id == 2010)
indiana_2016 <- clean_healthcare_data %>% filter(location_name == "Indiana", year_id == 2016)
first_model_2010 <- lm(index ~ age1*(median_income + Total + Private + Public), data = indiana_2010)
first_model_2016 <- lm(index~ age1*(median_income + Total + Private + Public), data = indiana_2016)
#Plotting the models (by year, the same state)
ggplot(indiana_2010, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2010) for Indiana", x = "Age Group", y = "Index")
ggplot(indiana_2016, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2016) for Indiana", x = "Age Group", y = "Index")
Diagnostics for the initial model
#---------------------- NOTES ON DIAGNOSTICS -------------------------:
#-- The first model (for both years) contains too few observations and is therefore not significant to the overarching goal, however, we wanted to make the interpretation of HAQ indices easy to digest by displaying the distribution for a single state.
#-- We do not perform the Shapiro Test for normality for this first model because there are too few observations and is therefore not significant to the overarching goal.
#-- We do not perform the Breusch-Pagan test for Heteroscedasticity for the same reason as above.
# -- We do not perform the Backward Selection Procedure because we have more parameters than observations and this would not yield a comprehensive result.
#checking the normality condition in the residuals
par(mfrow = c(1, 2))
qqnorm(residuals(first_model_2010)) #QQ-Plot
qqnorm(residuals(first_model_2016))
plot(first_model_2010, which = 1) # Residuals vs. Fitted
plot(first_model_2016, which = 1)
5 states
#Choosing 5 states for the second model
selected_states <- c("Indiana", "Texas", "New York", "Mississippi", "Idaho") #From every major region in the U.S.
severalStates_2010 <- clean_healthcare_data %>% dplyr::filter(location_name %in% selected_states, year_id == 2010)
severalStates_2016 <- clean_healthcare_data %>% dplyr::filter(location_name %in% selected_states, year_id == 2016)
#Second model:2010
#1
second_model_2010 <- lm(index ~ age1*(median_income + Total + Private + Public), data = severalStates_2010) #Since this model previously showed heteroscedasticity, we had to re-model by adding the weighted least squares (w)
abs_resid <- abs(residuals(second_model_2010))
var_model <- lm(abs_resid ~ median_income + Total + Private + Public, data = severalStates_2010) #these are the predictors we think were causing the heteroscedasticity
fitted_var <- fitted(var_model) #Construct weights w_i = 1 / (fitted variance)^2
w <- 1 / (fitted_var^2)
#2: with weighted-least-squares to remove the heteroscedasticity
second_model_2010_wls <- lm(index ~ age1*(median_income + Total + Private + Public), data = severalStates_2010, weights = w)
#Second model:2016
#1
second_model_2016 <- lm(index ~ age1*(median_income + Total + Private + Public), data = severalStates_2016)
abs_resid2 <- abs(residuals(second_model_2016))
var_model2 <- lm(abs_resid2 ~ median_income + Total + Private + Public, data = severalStates_2016)
fitted_var2 <- fitted(var_model2)
w2 <- 1 / (fitted_var2^2)
#2: with weighted-least-squares to remove the heteroscedasticity
second_model_2016_wls <- lm(index ~ age1*(median_income + Total + Private + Public), data = severalStates_2016, weights = w2)
#Plotting the models (by year, the same group of states)
ggplot(severalStates_2010, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2010) for Various States", x = "Age Group", y = "Index")
ggplot(severalStates_2016, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2016) for Various States", x = "Age Group", y = "Index")
Diagnostics for the initial second model (with BACKWARD SELECTION PROCEDURE and ANOVA)
#Shapiro test for normality
shapiro.test(residuals(second_model_2010_wls)) #Since the p-value is greater than 0.05, we accept (fail to reject) the null hypothesis that the data follows a normal distribution (normality condition passed)
##
## Shapiro-Wilk normality test
##
## data: residuals(second_model_2010_wls)
## W = 0.98858, p-value = 0.7033
shapiro.test(residuals(second_model_2016_wls)) #Normality condition passed as well
##
## Shapiro-Wilk normality test
##
## data: residuals(second_model_2016_wls)
## W = 0.97694, p-value = 0.1573
#checking the normality condition in the residuals
par(mfrow = c(1, 2))
qqnorm(residuals(second_model_2010_wls)); qqline(residuals(second_model_2010_wls), col = "red")
qqnorm(residuals(second_model_2016_wls)); qqline(residuals(second_model_2016_wls), col = "red")
boxplot(residuals(second_model_2010_wls))
boxplot(residuals(second_model_2016_wls))
boxcox(second_model_2010_wls,lambda = seq(-5,2)) #The result indicates a log transformation is NOT necessary since it contains 1
boxcox(second_model_2016_wls,lambda = seq(-5,2)) #Same as above
cook <- cooks.distance(second_model_2010_wls)
cook1 <- cooks.distance(second_model_2016_wls)
#Breusch-Pagan test to detect heteroscedasticity.
bptest(second_model_2010_wls, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: second_model_2010_wls
## BP = 7.6386, df = 24, p-value = 0.9994
bptest(second_model_2016_wls, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: second_model_2016_wls
## BP = 5.2203, df = 24, p-value = 1
#Residuals vs. Fitted
plot(second_model_2010_wls, which = 1)
plot(second_model_2016_wls, which = 1)
#VIF
vif(second_model_2010_wls, type = 'predictor') #what does this mean?
## Warning in vif.lm(second_model_2010_wls, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 8.978463e+10 4 23.39647
## median_income 1.677394e+02 1 12.95143
## Total 3.383868e+03 1 58.17103
## Private 1.551671e+03 1 39.39126
## Public 1.560363e+04 1 124.91447
## age1:median_income 4.733687e+08 4 12.14507
## age1:Total 2.118716e+17 4 146.47351
## age1:Private 7.878766e+15 4 97.06380
## age1:Public 1.964316e+13 4 45.88296
vif(second_model_2016_wls, type = 'predictor') #what does this mean?
## Warning in vif.lm(second_model_2016_wls, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 1.062315e+12 4 31.86263
## median_income 1.885567e+02 1 13.73160
## Total 8.135366e+02 1 28.52256
## Private 6.163557e+02 1 24.82651
## Public 1.020117e+04 1 101.00083
## age1:median_income 1.123827e+09 4 13.53124
## age1:Total 3.414766e+16 4 116.59237
## age1:Private 4.271244e+14 4 67.42474
## age1:Public 3.333748e+12 4 36.75925
#Selection procedure
#---- Several States year 2010
backward_selection <- step(second_model_2010_wls, direction = "backward", trace = FALSE) #We run this procedure to obtain a parsimonious, hierarchy-respecting model
summary(backward_selection)
##
## Call:
## lm(formula = index ~ age1 + median_income + Total + Public +
## age1:median_income + age1:Total + age1:Public, data = severalStates_2010,
## weights = w)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.3554 -1.0839 -0.1003 0.8416 3.5455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.46590 7.60043 8.745 2.66e-12 ***
## age1Adults -8.21338 11.41478 -0.720 0.4746
## age1Children 99.98177 42.25645 2.366 0.0212 *
## age1Seniors -16.61958 27.58532 -0.602 0.5491
## age1Young Adults 6.06630 9.63521 0.630 0.5313
## median_income 0.46938 0.66225 0.709 0.4812
## Total 0.22754 0.11229 2.026 0.0472 *
## Public -0.04320 0.14753 -0.293 0.7707
## age1Adults:median_income -0.15440 0.68957 -0.224 0.8236
## age1Children:median_income 1.44936 1.26860 1.142 0.2578
## age1Seniors:median_income 0.47694 0.72663 0.656 0.5141
## age1Young Adults:median_income 0.48240 0.69700 0.692 0.4915
## age1Adults:Total 0.04621 0.18956 0.244 0.8082
## age1Children:Total -1.24853 0.60466 -2.065 0.0433 *
## age1Seniors:Total -0.17167 0.35737 -0.480 0.6327
## age1Young Adults:Total -0.20755 0.14731 -1.409 0.1640
## age1Adults:Public -0.44482 0.23013 -1.933 0.0580 .
## age1Children:Public -0.06739 0.23377 -0.288 0.7742
## age1Seniors:Public 0.10976 0.16373 0.670 0.5052
## age1Young Adults:Public -0.53977 0.24653 -2.189 0.0325 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.471 on 60 degrees of freedom
## Multiple R-squared: 0.7629, Adjusted R-squared: 0.6878
## F-statistic: 10.16 on 19 and 60 DF, p-value: 1.989e-12
Anova(second_model_2010_wls, type = 3)
## Anova Table (Type III tests)
##
## Response: index
## Sum Sq Df F value Pr(>F)
## (Intercept) 139.801 1 61.8760 1.442e-10 ***
## age1 11.150 4 1.2337 0.3073
## median_income 1.034 1 0.4575 0.5016
## Total 0.046 1 0.0205 0.8866
## Private 0.215 1 0.0953 0.7588
## Public 0.168 1 0.0746 0.7858
## age1:median_income 11.005 4 1.2177 0.3139
## age1:Total 4.995 4 0.5527 0.6978
## age1:Private 4.797 4 0.5308 0.7136
## age1:Public 9.116 4 1.0087 0.4110
## Residuals 124.265 55
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#---- Several States year 2016
backward_selection <- step(second_model_2016_wls, direction = "backward", trace = FALSE)
summary(backward_selection)
##
## Call:
## lm(formula = index ~ age1 * (median_income + Total + Private +
## Public), data = severalStates_2016, weights = w2)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.86219 -0.99953 -0.04912 0.86564 3.04003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 67.95817 11.67276 5.822 3.11e-07 ***
## age1Adults -13.43757 15.31602 -0.877 0.3841
## age1Children 52.82563 86.34633 0.612 0.5432
## age1Seniors -24.13991 33.39460 -0.723 0.4728
## age1Young Adults -12.23685 15.36166 -0.797 0.4291
## median_income 0.85238 0.69677 1.223 0.2264
## Total -0.55861 0.92427 -0.604 0.5481
## Private 0.55715 0.84210 0.662 0.5110
## Public 0.77468 0.83676 0.926 0.3586
## age1Adults:median_income -0.70546 0.71380 -0.988 0.3273
## age1Children:median_income -1.15528 1.58213 -0.730 0.4684
## age1Seniors:median_income -0.02341 0.73704 -0.032 0.9748
## age1Young Adults:median_income 0.20571 0.75083 0.274 0.7851
## age1Adults:Total 2.13522 1.11089 1.922 0.0598 .
## age1Children:Total -1.35350 2.58775 -0.523 0.6030
## age1Seniors:Total 0.57902 0.99900 0.580 0.5646
## age1Young Adults:Total 0.37160 1.21684 0.305 0.7612
## age1Adults:Private -1.79138 1.02460 -1.748 0.0860 .
## age1Children:Private 1.10834 1.93182 0.574 0.5685
## age1Seniors:Private -0.43096 0.85139 -0.506 0.6147
## age1Young Adults:Private -0.25924 1.09870 -0.236 0.8143
## age1Adults:Public -2.18633 1.00509 -2.175 0.0339 *
## age1Children:Public 0.64898 2.01850 0.322 0.7490
## age1Seniors:Public -0.69043 0.83916 -0.823 0.4142
## age1Young Adults:Public -0.80780 1.13254 -0.713 0.4787
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.481 on 55 degrees of freedom
## Multiple R-squared: 0.7668, Adjusted R-squared: 0.665
## F-statistic: 7.534 on 24 and 55 DF, p-value: 3.566e-10
Anova(second_model_2016_wls, type = 3)
## Anova Table (Type III tests)
##
## Response: index
## Sum Sq Df F value Pr(>F)
## (Intercept) 74.355 1 33.8951 3.114e-07 ***
## age1 3.607 4 0.4110 0.79992
## median_income 3.283 1 1.4965 0.22642
## Total 0.801 1 0.3653 0.54807
## Private 0.960 1 0.4377 0.51098
## Public 1.880 1 0.8571 0.35859
## age1:median_income 25.419 4 2.8968 0.03013 *
## age1:Total 15.070 4 1.7175 0.15927
## age1:Private 14.142 4 1.6117 0.18436
## age1:Public 18.430 4 2.1003 0.09315 .
## Residuals 120.653 55
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Final model for the 5 states
#HAQ 2010
#-----Final model according to the backward selection procedure, VIF, and ANOVA-----
final_second_model_2010 <- lm(index ~ age1 + Total + Public + age1:Total + age1:Public, data = severalStates_2010, weights = w)
summary(final_second_model_2010) #Here, the main effect is drawn from "Total", and since this represents the total insurance coverage by age group, it means that age groups with higher HAQ indices had a greater insurance coverage
##
## Call:
## lm(formula = index ~ age1 + Total + Public + age1:Total + age1:Public,
## data = severalStates_2010, weights = w)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.0700 -1.0953 0.0119 0.9880 3.5985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.44283 6.21502 11.334 <2e-16 ***
## age1Adults -18.16487 11.21155 -1.620 0.1100
## age1Children 53.64680 41.70651 1.286 0.2029
## age1Seniors -22.42910 32.73508 -0.685 0.4957
## age1Young Adults 4.76103 9.46690 0.503 0.6167
## Total 0.26787 0.11735 2.283 0.0257 *
## Public -0.09649 0.15389 -0.627 0.5329
## age1Adults:Total 0.17896 0.17795 1.006 0.3183
## age1Children:Total -0.49371 0.48690 -1.014 0.3143
## age1Seniors:Total 0.13760 0.40620 0.339 0.7359
## age1Young Adults:Total -0.11053 0.16027 -0.690 0.4929
## age1Adults:Public -0.44359 0.26085 -1.701 0.0938 .
## age1Children:Public -0.24531 0.21686 -1.131 0.2621
## age1Seniors:Public 0.03554 0.16943 0.210 0.8345
## age1Young Adults:Public -0.22907 0.27558 -0.831 0.4089
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.783 on 65 degrees of freedom
## Multiple R-squared: 0.6225, Adjusted R-squared: 0.5412
## F-statistic: 7.657 on 14 and 65 DF, p-value: 3.468e-09
#HAQ 2016
#-----Final model according to the backward selection procedure, VIF, and ANOVA-----
final_second_model_2016 <- lm(index ~ age1 + median_income + Public + age1:Public, data = severalStates_2016, weights = w2)
summary(final_second_model_2016)
##
## Call:
## lm(formula = index ~ age1 + median_income + Public + age1:Public,
## data = severalStates_2016, weights = w2)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -3.4715 -1.1115 0.0586 0.9493 3.7886
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.47697 2.52092 29.147 < 2e-16 ***
## age1Adults -5.90088 2.72347 -2.167 0.033716 *
## age1Children 19.65421 9.84863 1.996 0.049924 *
## age1Seniors -10.43515 2.75002 -3.795 0.000314 ***
## age1Young Adults -2.01123 3.10782 -0.647 0.519681
## median_income 0.57475 0.10136 5.670 3.06e-07 ***
## Public 0.24335 0.07313 3.328 0.001407 **
## age1Adults:Public -0.38639 0.13645 -2.832 0.006064 **
## age1Children:Public -0.50592 0.21762 -2.325 0.023035 *
## age1Seniors:Public -0.19301 0.07506 -2.571 0.012289 *
## age1Young Adults:Public -0.34900 0.17052 -2.047 0.044495 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.554 on 69 degrees of freedom
## Multiple R-squared: 0.6779, Adjusted R-squared: 0.6312
## F-statistic: 14.52 on 10 and 69 DF, p-value: 1.842e-13
Diagnostics for the FINAL second model
#Shapiro test for normality
shapiro.test(residuals(final_second_model_2010)) #Since the p-value is greater than 0.05, we accept (fail to reject) the null hypothesis that the data follows a normal distribution (normality condition passed)
##
## Shapiro-Wilk normality test
##
## data: residuals(final_second_model_2010)
## W = 0.97951, p-value = 0.2275
shapiro.test(residuals(final_second_model_2016)) #Normality condition passed as well
##
## Shapiro-Wilk normality test
##
## data: residuals(final_second_model_2016)
## W = 0.97943, p-value = 0.2249
#checking the normality condition in the residuals
par(mfrow = c(1, 2))
qqnorm(residuals(final_second_model_2010)); qqline(residuals(final_second_model_2010), col = "red")
qqnorm(residuals(final_second_model_2016)); qqline(residuals(final_second_model_2016), col = "red")
boxplot(residuals(final_second_model_2010))
boxplot(residuals(final_second_model_2016))
boxcox(final_second_model_2010,lambda = seq(-5,2)) #The result indicates a log transformation is NOT necessary since it contains 1
boxcox(final_second_model_2016,lambda = seq(-5,2)) #Same as above
cook <- cooks.distance(final_second_model_2010)
cook1 <- cooks.distance(final_second_model_2016)
#Breusch-Pagan test to detect heteroscedasticity.
bptest(final_second_model_2010, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: final_second_model_2010
## BP = 7.9292, df = 14, p-value = 0.893
bptest(final_second_model_2016, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: final_second_model_2016
## BP = 3.6964, df = 10, p-value = 0.96
#Residuals vs. Fitted
plot(final_second_model_2010, which = 1)
plot(final_second_model_2016, which = 1)
#VIF
vif(final_second_model_2010, type = 'predictor') #what does this mean?
## Warning in vif.lm(final_second_model_2010, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 1.593229e+10 4 18.848846
## Total 1.725866e+01 1 4.154354
## Public 1.576329e+02 1 12.555193
## age1:Total 4.927906e+10 4 21.706150
## age1:Public 1.334268e+06 4 5.829823
vif(final_second_model_2016, type = 'predictor') #what does this mean?
## Warning in vif.lm(final_second_model_2016, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 7.532016e+04 4 4.070183
## median_income 3.625274e+00 1 1.904015
## Public 7.078375e+01 1 8.413308
## age1:Public 3.996165e+05 4 5.014244
All 50+DC states
healthcare_data_2010 <- clean_healthcare_data %>% filter(year_id == 2010)
healthcare_data_2016 <- clean_healthcare_data %>% filter(year_id == 2016)
#Third model: 2010
#1
third_model_2010 <- lm(index ~ age1*(median_income + Total + Private + Public), data = healthcare_data_2010)
abs_resid3 <- abs(residuals(third_model_2010))
var_model3 <- lm(abs_resid3 ~ median_income + Total + Private + Public, data = healthcare_data_2010)
fitted_var3 <- fitted(var_model3)
w3 <- 1 / (fitted_var3^2)
#2
third_model_2010_wls <- lm(index ~ age1*(median_income + Total + Private + Public), data = healthcare_data_2010, weights = w3)
#Third model: 2016
#1
third_model_2016 <- lm(index ~ age1*(median_income + Total + Private + Public), data = healthcare_data_2016)
abs_resid4 <- abs(residuals(third_model_2016))
var_model4 <- lm(abs_resid4 ~ median_income + Total + Private + Public, data = healthcare_data_2016)
fitted_var4 <- fitted(var_model4)
w4 <- 1 / (fitted_var4^2)
#2
third_model_2016_wls <- lm(index ~ age1*(median_income + Total + Private + Public), data = healthcare_data_2016, weights = w4)
#Plotting the models (by year, all states)
p <- ggplot(healthcare_data_2010, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2010) All",x = "Age Group", y = "Index")
q <- ggplot(healthcare_data_2016, aes(x = age_group_id, y = index, color = location_name, group = location_name)) + geom_line() + geom_point() + labs(title = "Index by Age Group (2016) All", x = "Age Group", y = "Index")
ggplotly(p)
ggplotly(q)
Diagnostics for the initial third model (with BACKWARD SELECTION PROCEDURE and ANOVA)
#Shapiro test for normality
shapiro.test(residuals(third_model_2010_wls))
##
## Shapiro-Wilk normality test
##
## data: residuals(third_model_2010_wls)
## W = 0.97881, p-value = 1.684e-09
shapiro.test(residuals(third_model_2016_wls))
##
## Shapiro-Wilk normality test
##
## data: residuals(third_model_2016_wls)
## W = 0.971, p-value = 1.173e-11
#checking the normality condition in the residuals
par(mfrow = c(1, 2))
qqnorm(residuals(third_model_2010_wls)); qqline(residuals(third_model_2010_wls), col = "red")
qqnorm(residuals(third_model_2016_wls)); qqline(residuals(third_model_2016_wls), col = "red")
boxplot(residuals(third_model_2010_wls))
boxplot(residuals(third_model_2016_wls))
boxcox(third_model_2010_wls,lambda = seq(-5,2)) #The result indicates a log transformation is NOT necessary since it contains 1
boxcox(third_model_2016_wls,lambda = seq(-5,2)) #Same as above
cook2 <- cooks.distance(third_model_2010_wls)
cook3 <- cooks.distance(third_model_2016_wls)
#Breusch-Pagan test to detect heteroscedasticity.
bptest(third_model_2010_wls, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: third_model_2010_wls
## BP = 30.639, df = 24, p-value = 0.1645
bptest(third_model_2016_wls, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: third_model_2016_wls
## BP = 30.199, df = 24, p-value = 0.1782
#Residuals vs. Fitted
plot(third_model_2010_wls, which = 1)
plot(third_model_2016_wls, which = 1)
#VIF
vif(third_model_2010_wls, type = "predictor")
## Warning in vif.lm(third_model_2010_wls, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 5.529573e+10 4 22.020970
## median_income 5.634743e+01 1 7.506492
## Total 1.115917e+03 1 33.405342
## Private 5.860186e+02 1 24.207820
## Public 1.210720e+04 1 110.032725
## age1:median_income 2.052222e+07 4 8.204045
## age1:Total 2.835346e+15 4 85.423186
## age1:Private 6.175565e+13 4 52.946174
## age1:Public 6.976211e+10 4 22.670049
vif(third_model_2016_wls, type = "predictor")
## Warning in vif.lm(third_model_2016_wls, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 2.675076e+12 4 35.761606
## median_income 7.120983e+01 1 8.438592
## Total 3.381562e+02 1 18.389023
## Private 3.329284e+02 1 18.246325
## Public 9.876862e+03 1 99.382404
## age1:median_income 2.764591e+07 4 8.515376
## age1:Total 4.126888e+15 4 89.526754
## age1:Private 7.365816e+13 4 54.125585
## age1:Public 2.157741e+11 4 26.106576
#Selection procedure
#---- All States year 2010
backward_selection <- step(third_model_2010_wls, direction = "backward", trace = FALSE) #We run this procedure to obtain a parsimonious, hierarchy-respecting model
summary(backward_selection)
##
## Call:
## lm(formula = index ~ age1 * (median_income + Total + Private +
## Public), data = healthcare_data_2010, weights = w3)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.5959 -0.8414 0.1087 0.8404 5.8670
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 69.49232 2.07773 33.446 < 2e-16 ***
## age1Adults -10.27080 3.43124 -2.993 0.00285 **
## age1Children 27.17610 11.54857 2.353 0.01886 *
## age1Seniors -14.00990 8.71477 -1.608 0.10832
## age1Young Adults 2.96910 2.97081 0.999 0.31790
## median_income 0.27313 0.10894 2.507 0.01237 *
## Total 0.62911 0.24654 2.552 0.01091 *
## Private -0.40444 0.22940 -1.763 0.07828 .
## Public -0.41759 0.22281 -1.874 0.06127 .
## age1Adults:median_income -0.10248 0.11870 -0.863 0.38818
## age1Children:median_income -0.26143 0.21332 -1.226 0.22074
## age1Seniors:median_income 0.06339 0.11988 0.529 0.59711
## age1Young Adults:median_income 0.15583 0.12419 1.255 0.20994
## age1Adults:Total 0.42848 0.29868 1.435 0.15180
## age1Children:Total -0.23277 0.33170 -0.702 0.48304
## age1Seniors:Total -0.52198 0.26879 -1.942 0.05249 .
## age1Young Adults:Total 0.28321 0.35848 0.790 0.42974
## age1Adults:Private -0.36144 0.28321 -1.276 0.20226
## age1Children:Private 0.06554 0.26806 0.245 0.80690
## age1Seniors:Private 0.50449 0.23072 2.187 0.02906 *
## age1Young Adults:Private -0.45938 0.34243 -1.342 0.18014
## age1Adults:Public -0.54639 0.26065 -2.096 0.03637 *
## age1Children:Public -0.19909 0.26595 -0.749 0.45433
## age1Seniors:Public 0.44746 0.22342 2.003 0.04554 *
## age1Young Adults:Public -0.34067 0.32117 -1.061 0.28915
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.29 on 791 degrees of freedom
## Multiple R-squared: 0.6671, Adjusted R-squared: 0.657
## F-statistic: 66.04 on 24 and 791 DF, p-value: < 2.2e-16
Anova(third_model_2010_wls, type = 3)
## Anova Table (Type III tests)
##
## Response: index
## Sum Sq Df F value Pr(>F)
## (Intercept) 1862.15 1 1118.6514 < 2.2e-16 ***
## age1 39.54 4 5.9381 0.0001038 ***
## median_income 10.46 1 6.2852 0.0123742 *
## Total 10.84 1 6.5114 0.0109054 *
## Private 5.17 1 3.1083 0.0782809 .
## Public 5.85 1 3.5127 0.0612696 .
## age1:median_income 24.99 4 3.7533 0.0049322 **
## age1:Total 44.92 4 6.7458 2.432e-05 ***
## age1:Private 88.08 4 13.2284 1.948e-10 ***
## age1:Public 143.77 4 21.5919 < 2.2e-16 ***
## Residuals 1316.73 791
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#---- All States year 2016
backward_selection <- step(third_model_2016_wls, direction = "backward", trace = FALSE)
summary(backward_selection)
##
## Call:
## lm(formula = index ~ age1 * (median_income + Total + Private +
## Public), data = healthcare_data_2016, weights = w4)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.7889 -0.8322 0.1119 0.8444 6.8608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.98552 4.03625 13.623 < 2e-16 ***
## age1Adults -0.84804 5.70830 -0.149 0.881936
## age1Children 59.14223 15.42362 3.835 0.000136 ***
## age1Seniors -33.73231 14.00836 -2.408 0.016267 *
## age1Young Adults 9.98514 5.41634 1.844 0.065627 .
## median_income 0.10899 0.08500 1.282 0.200125
## Total 0.10739 0.18651 0.576 0.564926
## Private 0.24256 0.17247 1.406 0.160004
## Public 0.26367 0.16494 1.599 0.110312
## age1Adults:median_income 0.01312 0.09538 0.138 0.890651
## age1Children:median_income -0.11860 0.15607 -0.760 0.447545
## age1Seniors:median_income 0.02871 0.09558 0.300 0.763978
## age1Young Adults:median_income 0.24600 0.10072 2.442 0.014805 *
## age1Adults:Total 0.79634 0.24204 3.290 0.001046 **
## age1Children:Total -0.10689 0.31499 -0.339 0.734430
## age1Seniors:Total 0.52218 0.23934 2.182 0.029422 *
## age1Young Adults:Total -0.01016 0.28352 -0.036 0.971428
## age1Adults:Private -0.82157 0.23072 -3.561 0.000392 ***
## age1Children:Private -0.42274 0.26896 -1.572 0.116408
## age1Seniors:Private -0.25742 0.17358 -1.483 0.138458
## age1Young Adults:Private -0.21666 0.26694 -0.812 0.417241
## age1Adults:Public -0.97970 0.21378 -4.583 5.33e-06 ***
## age1Children:Public -0.59331 0.26176 -2.267 0.023682 *
## age1Seniors:Public -0.29577 0.16556 -1.787 0.074400 .
## age1Young Adults:Public -0.27531 0.25279 -1.089 0.276454
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.307 on 791 degrees of freedom
## Multiple R-squared: 0.6647, Adjusted R-squared: 0.6545
## F-statistic: 65.34 on 24 and 791 DF, p-value: < 2.2e-16
Anova(third_model_2016_wls, type = 3)
## Anova Table (Type III tests)
##
## Response: index
## Sum Sq Df F value Pr(>F)
## (Intercept) 316.95 1 185.5842 < 2.2e-16 ***
## age1 45.76 4 6.6986 2.648e-05 ***
## median_income 2.81 1 1.6442 0.2001252
## Total 0.57 1 0.3315 0.5649255
## Private 3.38 1 1.9779 0.1600039
## Public 4.36 1 2.5555 0.1103120
## age1:median_income 27.14 4 3.9722 0.0033719 **
## age1:Total 32.93 4 4.8207 0.0007595 ***
## age1:Private 27.90 4 4.0846 0.0027714 **
## age1:Public 51.99 4 7.6102 5.112e-06 ***
## Residuals 1350.90 791
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Final model for the All States
#HAQ 2010
#-----Final model according to the backward selection procedure, VIF, and ANOVA-----
final_third_model_2010 <- lm(index ~ age1*(median_income + Private + Public), data = healthcare_data_2010, weights = w3) #We had to drop Total because of the high multicollinearity with Private and Public
summary(final_third_model_2010)
##
## Call:
## lm(formula = index ~ age1 * (median_income + Private + Public),
## data = healthcare_data_2010, weights = w3)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -5.3167 -0.8885 0.1099 0.9343 5.8362
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.31742 2.01978 35.309 < 2e-16 ***
## age1Adults -7.74367 3.39955 -2.278 0.023000 *
## age1Children 40.97365 7.77157 5.272 1.74e-07 ***
## age1Seniors -7.70183 3.16783 -2.431 0.015266 *
## age1Young Adults 1.90132 2.97708 0.639 0.523232
## median_income 0.30446 0.11208 2.716 0.006742 **
## Private 0.17670 0.02851 6.197 9.21e-10 ***
## Public 0.14649 0.02888 5.072 4.90e-07 ***
## age1Adults:median_income -0.17364 0.12206 -1.423 0.155247
## age1Children:median_income -0.16878 0.20846 -0.810 0.418365
## age1Seniors:median_income 0.04252 0.12299 0.346 0.729677
## age1Young Adults:median_income 0.14598 0.12780 1.142 0.253677
## age1Adults:Private 0.06805 0.05057 1.346 0.178830
## age1Children:Private -0.32457 0.09576 -3.389 0.000735 ***
## age1Seniors:Private -0.06547 0.03646 -1.796 0.072880 .
## age1Young Adults:Private -0.15637 0.04380 -3.570 0.000378 ***
## age1Adults:Public -0.32548 0.06045 -5.384 9.60e-08 ***
## age1Children:Public -0.54299 0.08452 -6.425 2.27e-10 ***
## age1Seniors:Public -0.10160 0.02974 -3.416 0.000667 ***
## age1Young Adults:Public -0.11703 0.06380 -1.834 0.066980 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.336 on 796 degrees of freedom
## Multiple R-squared: 0.6409, Adjusted R-squared: 0.6323
## F-statistic: 74.76 on 19 and 796 DF, p-value: < 2.2e-16
#HAQ 2016
#-----Final model according to the backward selection procedure, VIF, and ANOVA-----
final_third_model_2016 <- lm(index ~ age1*(median_income + Private + Public), data = healthcare_data_2016, weights = w4) #We had to drop Total because of the high multicollinearity with Private and Public
summary(final_third_model_2016)
##
## Call:
## lm(formula = index ~ age1 * (median_income + Private + Public),
## data = healthcare_data_2016, weights = w4)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.7890 -0.8343 0.1365 0.8816 7.3793
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.73698 3.93167 14.176 < 2e-16 ***
## age1Adults 1.47430 5.69486 0.259 0.795792
## age1Children 58.40722 13.23000 4.415 1.15e-05 ***
## age1Seniors 21.55160 4.15653 5.185 2.74e-07 ***
## age1Young Adults 9.47108 5.38456 1.759 0.078973 .
## median_income 0.10900 0.08750 1.246 0.213230
## Private 0.33798 0.04915 6.877 1.23e-11 ***
## Public 0.35652 0.03560 10.015 < 2e-16 ***
## age1Adults:median_income 0.04183 0.09805 0.427 0.669754
## age1Children:median_income -0.11859 0.16060 -0.738 0.460447
## age1Seniors:median_income 0.05915 0.09811 0.603 0.546746
## age1Young Adults:median_income 0.25258 0.10261 2.462 0.014044 *
## age1Adults:Private -0.07614 0.07397 -1.029 0.303611
## age1Children:Private -0.51787 0.14919 -3.471 0.000546 ***
## age1Seniors:Private -0.34859 0.05310 -6.564 9.41e-11 ***
## age1Young Adults:Private -0.22198 0.07007 -3.168 0.001593 **
## age1Adults:Public -0.32722 0.06097 -5.367 1.05e-07 ***
## age1Children:Public -0.68585 0.13160 -5.212 2.39e-07 ***
## age1Seniors:Public -0.33316 0.03606 -9.239 < 2e-16 ***
## age1Young Adults:Public -0.28372 0.06089 -4.660 3.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.345 on 796 degrees of freedom
## Multiple R-squared: 0.6425, Adjusted R-squared: 0.6339
## F-statistic: 75.28 on 19 and 796 DF, p-value: < 2.2e-16
Diagnostics for the FINAL third model
#Shapiro test for normality
shapiro.test(residuals(final_third_model_2010))
##
## Shapiro-Wilk normality test
##
## data: residuals(final_third_model_2010)
## W = 0.98287, p-value = 3.531e-08
shapiro.test(residuals(final_third_model_2016))
##
## Shapiro-Wilk normality test
##
## data: residuals(final_third_model_2016)
## W = 0.97579, p-value = 2.2e-10
#checking the normality condition in the residuals
par(mfrow = c(1, 2))
qqnorm(residuals(final_third_model_2010)); qqline(residuals(final_third_model_2010), col = "red")
qqnorm(residuals(final_third_model_2016)); qqline(residuals(final_third_model_2016), col = "red")
boxplot(residuals(final_third_model_2010))
boxplot(residuals(final_third_model_2016))
boxcox(final_third_model_2010,lambda = seq(-5,2)) #The result indicates a log transformation is NOT necessary since it contains 1
boxcox(final_third_model_2016,lambda = seq(-5,2)) #Same as above
cook2 <- cooks.distance(final_third_model_2010)
cook3 <- cooks.distance(final_third_model_2016)
#Breusch-Pagan test to detect heteroscedasticity.
bptest(final_third_model_2010, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: final_third_model_2010
## BP = 31.109, df = 19, p-value = 0.03928
bptest(final_third_model_2016, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: final_third_model_2016
## BP = 31.302, df = 19, p-value = 0.0374
#Residuals vs. Fitted
plot(final_third_model_2010, which = 1)
plot(final_third_model_2016, which = 1)
#VIF
vif(final_third_model_2010, type = "predictor")
## Warning in vif.lm(final_third_model_2010, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 1.939159e+09 4 14.486106
## median_income 5.563153e+01 1 7.458655
## Private 8.444504e+00 1 2.905943
## Public 1.897624e+02 1 13.775427
## age1:median_income 1.658825e+07 4 7.988681
## age1:Private 7.771313e+08 4 12.921466
## age1:Public 2.627519e+06 4 6.345170
vif(final_third_model_2016, type = "predictor")
## Warning in vif.lm(final_third_model_2016, type = "predictor"): type = 'predictor' is available only for unweighted linear models;
## type = 'terms' will be used
## GVIF Df GVIF^(1/(2*Df))
## age1 5.697563e+10 4 22.103505
## median_income 7.120983e+01 1 8.438592
## Private 2.551138e+01 1 5.050880
## Public 4.342114e+02 1 20.837741
## age1:median_income 2.535837e+07 4 8.423938
## age1:Private 2.445100e+10 4 19.885525
## age1:Public 2.795084e+07 4 8.527061
#AIC
AIC_2010 <- step(final_third_model_2010, direction="both", criterion='AIC'); summary(AIC_2010)
## Start: AIC=492.35
## index ~ age1 * (median_income + Private + Public)
##
## Df Sum of Sq RSS AIC
## <none> 1420.5 492.35
## - age1:median_income 4 35.120 1455.6 504.28
## - age1:Private 4 54.599 1475.1 515.13
## - age1:Public 4 108.952 1529.5 544.65
##
## Call:
## lm(formula = index ~ age1 * (median_income + Private + Public),
## data = healthcare_data_2010, weights = w3)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -5.3167 -0.8885 0.1099 0.9343 5.8362
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.31742 2.01978 35.309 < 2e-16 ***
## age1Adults -7.74367 3.39955 -2.278 0.023000 *
## age1Children 40.97365 7.77157 5.272 1.74e-07 ***
## age1Seniors -7.70183 3.16783 -2.431 0.015266 *
## age1Young Adults 1.90132 2.97708 0.639 0.523232
## median_income 0.30446 0.11208 2.716 0.006742 **
## Private 0.17670 0.02851 6.197 9.21e-10 ***
## Public 0.14649 0.02888 5.072 4.90e-07 ***
## age1Adults:median_income -0.17364 0.12206 -1.423 0.155247
## age1Children:median_income -0.16878 0.20846 -0.810 0.418365
## age1Seniors:median_income 0.04252 0.12299 0.346 0.729677
## age1Young Adults:median_income 0.14598 0.12780 1.142 0.253677
## age1Adults:Private 0.06805 0.05057 1.346 0.178830
## age1Children:Private -0.32457 0.09576 -3.389 0.000735 ***
## age1Seniors:Private -0.06547 0.03646 -1.796 0.072880 .
## age1Young Adults:Private -0.15637 0.04380 -3.570 0.000378 ***
## age1Adults:Public -0.32548 0.06045 -5.384 9.60e-08 ***
## age1Children:Public -0.54299 0.08452 -6.425 2.27e-10 ***
## age1Seniors:Public -0.10160 0.02974 -3.416 0.000667 ***
## age1Young Adults:Public -0.11703 0.06380 -1.834 0.066980 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.336 on 796 degrees of freedom
## Multiple R-squared: 0.6409, Adjusted R-squared: 0.6323
## F-statistic: 74.76 on 19 and 796 DF, p-value: < 2.2e-16
AIC_2016 <- step(final_third_model_2016, direction="both", criterion='AIC'); summary(AIC_2016)
## Start: AIC=503.77
## index ~ age1 * (median_income + Private + Public)
##
## Df Sum of Sq RSS AIC
## <none> 1440.5 503.77
## - age1:median_income 4 25.871 1466.4 510.30
## - age1:Private 4 111.710 1552.2 556.72
## - age1:Public 4 170.443 1611.0 587.02
##
## Call:
## lm(formula = index ~ age1 * (median_income + Private + Public),
## data = healthcare_data_2016, weights = w4)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.7890 -0.8343 0.1365 0.8816 7.3793
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.73698 3.93167 14.176 < 2e-16 ***
## age1Adults 1.47430 5.69486 0.259 0.795792
## age1Children 58.40722 13.23000 4.415 1.15e-05 ***
## age1Seniors 21.55160 4.15653 5.185 2.74e-07 ***
## age1Young Adults 9.47108 5.38456 1.759 0.078973 .
## median_income 0.10900 0.08750 1.246 0.213230
## Private 0.33798 0.04915 6.877 1.23e-11 ***
## Public 0.35652 0.03560 10.015 < 2e-16 ***
## age1Adults:median_income 0.04183 0.09805 0.427 0.669754
## age1Children:median_income -0.11859 0.16060 -0.738 0.460447
## age1Seniors:median_income 0.05915 0.09811 0.603 0.546746
## age1Young Adults:median_income 0.25258 0.10261 2.462 0.014044 *
## age1Adults:Private -0.07614 0.07397 -1.029 0.303611
## age1Children:Private -0.51787 0.14919 -3.471 0.000546 ***
## age1Seniors:Private -0.34859 0.05310 -6.564 9.41e-11 ***
## age1Young Adults:Private -0.22198 0.07007 -3.168 0.001593 **
## age1Adults:Public -0.32722 0.06097 -5.367 1.05e-07 ***
## age1Children:Public -0.68585 0.13160 -5.212 2.39e-07 ***
## age1Seniors:Public -0.33316 0.03606 -9.239 < 2e-16 ***
## age1Young Adults:Public -0.28372 0.06089 -4.660 3.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.345 on 796 degrees of freedom
## Multiple R-squared: 0.6425, Adjusted R-squared: 0.6339
## F-statistic: 75.28 on 19 and 796 DF, p-value: < 2.2e-16
plot(AIC_2010, main = "AIC for HAQ Scores 2010", col="blue")
plot(AIC_2016, main = "AIC for HAQ Scores 2016", col="blue")
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced